109 research outputs found

    Fisher: a program for the detection of H/ACA snoRNAs using MFE secondary structure prediction and comparative genomics – assessment and update

    Background: The H/ACA family of small nucleolar RNAs (snoRNAs) plays a central role in guiding the pseudouridylation of ribosomal RNA (rRNA). In an effort to systematically identify the complete set of rRNA-modifying H/ACA snoRNAs from the genome sequence of the budding yeast, Saccharomyces cerevisiae, we developed a program, Fisher, and previously presented several candidate snoRNAs based on our analysis [1].
    Findings: In this report, we provide a brief update of this work, which was discontinued after the publication of experimentally identified snoRNAs [2] identical to candidates we had identified bioinformatically using Fisher. Our motivation for revisiting this work is, first, to report on the status of the candidate snoRNAs described in [1], and second, to report that a modified version of Fisher, together with the available multiple yeast genome sequences, correctly identified several H/ACA snoRNAs for modification sites not identified by the snoGPS program [3]. While we are no longer developing Fisher, we briefly consider the merits of the Fisher algorithm relative to snoGPS, which may be of use to workers considering a similar search strategy for the identification of small RNAs. The modified source code for Fisher is made available as supplementary material.
    Conclusion: Our results confirm the validity of using minimum free energy (MFE) secondary structure prediction to guide comparative genomic screening for RNA families with few sequence constraints.
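    Fisher combines MFE folding with comparative genomics, but any H/ACA search starts from the family's few sequence constraints: an H box (consensus ANANNA) in the hinge and an ACA box located three nucleotides upstream of the 3' end. The sketch below checks only those two motifs on a candidate sequence; it is not Fisher's code, and the function name and return format are hypothetical.

```python
import re

# Hypothetical helper (not part of Fisher): flags the two short motifs
# shared by H/ACA snoRNAs. The lookahead lets overlapping H boxes match.
H_BOX = re.compile(r"(?=(A[ACGU]A[ACGU]{2}A))")  # consensus ANANNA


def scan_haca_boxes(seq: str) -> dict:
    """Return H-box start positions and whether an ACA box sits 3 nt from the 3' end."""
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA input
    h_positions = [m.start() for m in H_BOX.finditer(seq)]
    # ACA box: the trinucleotide ACA followed by exactly three nucleotides
    has_aca = len(seq) >= 6 and seq[-6:-3] == "ACA"
    return {"h_box_positions": h_positions, "aca_box": has_aca}


print(scan_haca_boxes("CCAGAUUAGGCCACACCC"))
```

    In a genome-wide screen, such a motif filter would only pre-select windows; deciding whether the flanking regions fold into the characteristic hairpins is the job of the structure prediction step.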

    Classification of microarrays; synergistic effects between normalization, gene selection and machine learning

    Background: Machine learning is a powerful approach for describing and predicting classes in microarray data. Although several comparative studies have investigated the relative performance of various machine learning methods, these often do not account for the fact that performance (e.g. error rate) results from a series of analysis steps, the most important of which are data normalization, gene selection and machine learning.
    Results: In this study, we used seven previously published cancer-related microarray data sets to compare the effects on classification performance of five normalization methods, three gene selection methods with 21 different numbers of selected genes, and eight machine learning methods. Performance in terms of error rate was rigorously estimated by repeatedly employing a double cross-validation approach. Since performance varies greatly between data sets, we devised an analysis method that first compares methods within individual data sets and then visualizes the comparisons across data sets. We discovered both well-performing individual methods and synergies between different methods.
    Conclusion: Support Vector Machines with a radial basis kernel, linear kernel or polynomial kernel of degree 2 all performed consistently well across data sets. We show that there is a synergistic relationship between these methods and gene selection based on the T-test and the selection of a relatively high number of genes. We also find that these methods benefit significantly from normalized data, although it is hard to draw general conclusions about the relative performance of different normalization procedures.
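    The key point of double cross-validation is that gene selection and the choice of how many genes to keep happen inside the training data of each outer fold, so the outer error estimate stays unbiased. A minimal pure-Python sketch, assuming two classes coded 0/1 and Welch's form of the t-statistic; the nearest-centroid classifier is a deliberately simple stand-in for the SVMs and other learners compared in the study, and the default gene counts and fold numbers are hypothetical.

```python
import random
from statistics import mean, variance


def t_scores(X, y):
    """Absolute Welch t-statistic of every gene, computed on the given (training) samples."""
    scores = []
    for j in range(len(X[0])):
        a = [x[j] for x, lab in zip(X, y) if lab == 0]
        b = [x[j] for x, lab in zip(X, y) if lab == 1]
        se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
        scores.append(abs(mean(a) - mean(b)) / se if se > 0 else 0.0)
    return scores


def top_genes(scores, k):
    """Indices of the k highest-scoring genes."""
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def centroid_error(Xtr, ytr, Xte, yte, genes):
    """Error rate of a nearest-centroid classifier (stand-in for the study's SVMs)."""
    cents = {}
    for c in (0, 1):
        rows = [x for x, lab in zip(Xtr, ytr) if lab == c]
        cents[c] = [mean(r[j] for r in rows) for j in genes]
    wrong = 0
    for x, lab in zip(Xte, yte):
        d = {c: sum((x[j] - m) ** 2 for j, m in zip(genes, cents[c])) for c in cents}
        wrong += min(d, key=d.get) != lab
    return wrong / len(yte)


def stratified_folds(y, n_folds, rng):
    """Split sample indices into folds with roughly equal class proportions."""
    folds = [[] for _ in range(n_folds)]
    for c in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == c]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def split(X, y, test):
    """Return training and test subsets for one fold."""
    held = set(test)
    train = [i for i in range(len(y)) if i not in held]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def cv_error(X, y, k, n_folds, rng):
    """Plain cross-validation error for a fixed number k of selected genes."""
    errs = []
    for test in stratified_folds(y, n_folds, rng):
        Xtr, ytr, Xte, yte = split(X, y, test)
        genes = top_genes(t_scores(Xtr, ytr), k)
        errs.append(centroid_error(Xtr, ytr, Xte, yte, genes))
    return mean(errs)


def double_cv_error(X, y, gene_counts=(1, 5, 10), outer=5, inner=4, seed=0):
    """Outer loop estimates the error; inner loop picks the gene count on training data only."""
    rng = random.Random(seed)
    errs = []
    for test in stratified_folds(y, outer, rng):
        Xtr, ytr, Xte, yte = split(X, y, test)
        best_k = min(gene_counts, key=lambda k: cv_error(Xtr, ytr, k, inner, rng))
        genes = top_genes(t_scores(Xtr, ytr), best_k)
        errs.append(centroid_error(Xtr, ytr, Xte, yte, genes))
    return mean(errs)
```

    Repeating the procedure with different `seed` values and averaging mirrors the repeated double cross-validation described above.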

    Discovery and characterisation of dietary patterns in two Nordic countries. Using non-supervised and supervised multivariate statistical techniques to analyse dietary survey data

    This Nordic study applies multivariate data analysis (MDA) to preschool Danish consumers as well as preschool and elementary school Swedish consumers. Unlike comparable studies, it combines two separate MDA approaches: pattern discovery (PD) and predictive modelling (PM). PD, i.e. hierarchical cluster analysis (HCA) and factor analysis (using PCA), helped identify distinct consumer groupings and relationships across food groups, respectively, whereas PM enabled the disclosure of deeply entrenched associations. Seventeen clusters, here defined as dietary prototypes, were identified by HCA in the entire bi-national data set. Further processing of these prototypes disclosed several notable relationships in the consumption data: a striking disparity between the consumption patterns of Danish and Swedish preschool children was revealed and further dissected by PM. Two prudent and mutually similar dietary prototypes appeared in each of the two Swedish elementary school children data subsets. Dietary prototypes rich in sweetened soft beverages appeared among Danish and Swedish children alike. The results suggest prototype-specific risk assessment and study design.
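    Hierarchical cluster analysis groups respondents by repeatedly merging the two most similar clusters until the desired number remains. The abstract does not state the linkage rule or distance measure, so this pure-Python sketch assumes average linkage with Euclidean distance on per-respondent food-group intake vectors; the function name and toy data are hypothetical.

```python
from itertools import combinations
from math import dist


def average_linkage(points, n_clusters):
    """Naive agglomerative clustering (average linkage, Euclidean distance)."""
    clusters = [[i] for i in range(len(points))]

    def link(a, b):
        # mean pairwise distance between members of two clusters
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        ia, ib = min(combinations(range(len(clusters)), 2),
                     key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]  # ib > ia, so index ia is unaffected
    return sorted(sorted(c) for c in clusters)


# toy intakes of two food groups for six respondents
intakes = [(5.0, 1.0), (5.2, 0.9), (4.8, 1.1), (1.0, 6.0), (1.1, 5.8), (0.9, 6.2)]
print(average_linkage(intakes, 2))
```

    Each resulting cluster plays the role of a dietary prototype; this naive version is cubic in the number of respondents, so a real survey would call for an optimized library routine.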

    A Study in RNA Bioinformatics : Identification, Prediction and Analysis

    Research in the last few decades has revealed the great capacity of the RNA molecule. RNA, previously assumed to serve mainly as an intermediate in the translation of genes into proteins, is now known to play many important roles in the cell beyond those of messenger RNA and transfer RNA, including catalyzing reactions and regulating genes at various levels. This thesis investigates several computational aspects of RNA. We discuss the identification of novel RNAs and of RNAs known to exist in related species, RNA secondary structure prediction, and more general tools for analyzing, visualizing and classifying RNA sequences. We present two benchmark studies concerning RNA identification, covering both de novo identification/characterization of single RNA sequences and homology search methods. We develop a novel algorithm for analyzing the RNA folding landscape, based on the nearest neighbor energy model adopted in many secondary structure prediction programs. We implement this algorithm, which computes structural neighbors of a given RNA secondary structure, in the program RNAbor, which is accessible on a web server. Furthermore, we combine a mutual information based structure prediction algorithm with a sequence logo visualization to create a novel visualization tool for analyzing an RNA alignment and identifying covarying sites. Finally, we present extensions to sequence logos for the purpose of tRNA identity analysis. We introduce function logos, which display features that distinguish functional subclasses within a large set of structurally related sequences, as well as inverse logos, which display underrepresented features. To compare tRNA identity elements between different taxa, we introduce two contrasting logos: the information difference logo and the Kullback-Leibler divergence difference logo.
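    Structural neighbors are usually defined by base-pair distance: the number of base pairs present in one structure but not the other. RNAbor computes neighborhood statistics for all distances by dynamic programming under the nearest neighbor energy model; the brute-force sketch below only illustrates the underlying notion, enumerating the distance-1 neighbors of a dot-bracket structure without energies. The function names, the canonical pair set (Watson-Crick plus GU) and the minimum hairpin size of 3 are assumptions.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairs(db: str) -> set:
    """Base pairs (i, j) of a balanced dot-bracket secondary structure."""
    stack, result = [], set()
    for k, ch in enumerate(db):
        if ch == "(":
            stack.append(k)
        elif ch == ")":
            result.add((stack.pop(), k))
    return result


def bp_distance(s: str, t: str) -> int:
    """Number of base pairs found in exactly one of the two structures."""
    return len(pairs(s) ^ pairs(t))


def neighbors_1(seq: str, db: str, min_loop: int = 3) -> list:
    """All structures at base-pair distance 1: remove one pair or add one valid pair."""
    p = pairs(db)
    out = []
    for i, j in p:  # removals
        s = list(db)
        s[i] = s[j] = "."
        out.append("".join(s))
    for i in range(len(seq)):  # additions
        for j in range(i + min_loop + 1, len(seq)):
            if db[i] != "." or db[j] != "." or (seq[i], seq[j]) not in CANONICAL:
                continue
            # the new pair must not cross (form a pseudoknot with) any existing pair
            if any(a < i < b < j or i < a < j < b for a, b in p):
                continue
            s = list(db)
            s[i], s[j] = "(", ")"
            out.append("".join(s))
    return out


print(sorted(neighbors_1("GGGAAACCC", "(((...)))")))
```

    Enumerating neighbors this way grows combinatorially with distance, which is why a dynamic programming formulation is needed for larger distances and realistic sequence lengths.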

    New techniques for analysing RNA structure

    … in mathematics with a specialization in bioinformatics, to be presented for public examination

    Visualizing bacterial tRNA identity determinants and antideterminants using function logos and inverse function logos

    Sequence logos are stacked bar graphs that generalize the notion of a consensus sequence. They use entropy statistics very effectively to display variation in a structural alignment of sequences of a common function, while emphasizing its over-represented features. Yet sequence logos cannot display features that distinguish functional subclasses within a structurally related superfamily, nor can they display under-represented features. We introduce two extensions to address these needs: function logos and inverse logos. Function logos display subfunctions that are over-represented among sequences carrying a specific feature. Inverse logos generalize both sequence logos and function logos by displaying under-represented, rather than over-represented, features or functions in structural alignments. To make inverse logos, a compositional inverse is applied to the feature or function frequency distributions before logo construction, where a compositional inverse is a mathematical transform that makes common features or functions rare and vice versa. We applied these methods to a database of structurally aligned bacterial tDNAs to create highly condensed, bird's-eye views of potentially all so-called identity determinants and antideterminants that confer specific amino acid charging or initiator function on tRNAs in bacteria. We recovered both known and a few potentially novel identity elements. Function logos and inverse logos are useful tools for exploratory bioinformatic analysis of structure–function relationships in sequence families and superfamilies.
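    The inverse-logo idea can be illustrated on a single alignment column: compute frequencies, apply a compositional inverse, and scale each symbol by the column's information content. This sketch assumes the inverse takes the reciprocal of every frequency and renormalizes (the Aitchison-style inverse), uses a pseudocount of 0.5 so that absent symbols stay finite, and omits the small-sample correction; the paper's function logos additionally condition on a structural feature and stack functional classes rather than nucleotides. All names are hypothetical.

```python
from collections import Counter
from math import log2

ALPHABET = "ACGU"


def column_freqs(column: str, pseudocount: float = 0.5) -> dict:
    """Symbol frequencies at one alignment column (pseudocount is an assumption)."""
    counts = Counter(column)
    total = len(column) + pseudocount * len(ALPHABET)
    return {a: (counts[a] + pseudocount) / total for a in ALPHABET}


def compositional_inverse(freqs: dict) -> dict:
    """Reciprocal of each frequency, renormalized: common becomes rare and vice versa."""
    inv = {a: 1.0 / p for a, p in freqs.items()}
    z = sum(inv.values())
    return {a: v / z for a, v in inv.items()}


def logo_heights(freqs: dict) -> dict:
    """Stack heights: frequency times information content in bits (no small-sample correction)."""
    entropy = -sum(p * log2(p) for p in freqs.values() if p > 0)
    info = log2(len(freqs)) - entropy
    return {a: p * info for a, p in freqs.items()}


f = column_freqs("AAAAAAAG")
print(logo_heights(f))                         # A dominates the ordinary logo
print(logo_heights(compositional_inverse(f)))  # C and U dominate the inverse logo
```

    Because the inverse is applied before the logo is built, the same height formula serves both logo types.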

    Structural Modeling Extends QSAR Analysis of Antibody-Lysozyme Interactions to 3D-QSAR

    This work shows that quantitative multivariate modeling is an emerging possibility for unraveling protein-protein interactions using a combination of designed mutations with sequence and structure information. Using this approach, it is possible to stereochemically determine which residue properties contribute most to the interaction. This is illustrated by results from modeling the interactions of the wild type and 17 single and double mutants of a camel antibody specific for lysozyme. Linear multivariate models describing association and dissociation rates as well as affinity were developed. Sequence information in the form of amino acid property scales was combined with 3D structure information (obtained using molecular mechanics calculations) in the form of coordinates of the α-carbons and the center of the side chains. The results show that, in addition to the amino acid properties of the mutated residues 101 and 105, the dissociation rate is controlled by the side-chain coordinate of residue 105, whereas the association rate is determined by the coordinates of residues 99, 100, 105 (side chain), 111, and 112. The great difference between the models for association and dissociation rates illustrates that the event of molecular recognition and the property of binding stability rely on different physical processes.
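    The abstract does not name the linear multivariate method; partial least squares (PLS) is a common choice in QSAR, so this sketch assumes a one-component PLS1 regression. Each row of `X` would hold the descriptors of one antibody variant (for instance property-scale values of the mutated residues and side-chain coordinates), and `y` a kinetic constant such as a log-transformed rate; the function name and toy data are hypothetical.

```python
from statistics import mean


def pls1_one_component(X, y):
    """One-latent-variable PLS1 regression; returns a prediction function."""
    x_means = [mean(col) for col in zip(*X)]
    y_mean = mean(y)
    Xc = [[v - m for v, m in zip(row, x_means)] for row in X]  # centered descriptors
    yc = [v - y_mean for v in y]
    # weight vector w proportional to X^T y (covariance of each descriptor with y)
    w = [sum(row[j] * yy for row, yy in zip(Xc, yc)) for j in range(len(x_means))]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]  # latent scores
    q = sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t)  # y loading

    def predict(row):
        score = sum((v - m) * wj for v, m, wj in zip(row, x_means, w))
        return y_mean + q * score

    return predict


model = pls1_one_component([[1.0], [2.0], [3.0], [4.0]], [2.0, 4.0, 6.0, 8.0])
print(model([5.0]))
```

    The weights `w` indicate which descriptors drive the latent variable, which is how such models point to the residue properties and coordinates that matter most; further components would be extracted from the residuals in the same way.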